ACCELERATED LINEARIZED BREGMAN METHOD 



BO HUANG 1 ", SHIQIAN MA*t, AND DONALD GOLDFARBt 
June 21, 2011 

Abstract. In this paper, we propose and analyze an accelerated linearized Bregman (ALB) method for solving the basis 
pursuit and related sparse optimization problems. This accelerated algorithm is based on the fact that the linearized Bregman 
(LB) algorithm is equivalent to a gradient descent method applied to a certain dual formulation. We show that the LB method 
requires 0(l/e) iterations to obtain an e-optimal solution and the ALB algorithm reduces this iteration complexity to 0(l/y/e) 
while requiring almost the same computational effort on each iteration. Numerical results on compressed sensing and matrix 
completion problems are presented that demonstrate that the ALB method can be significantly faster than the LB method. 
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(1.1) min J(x) s.t. Ax = b, 



1. Introduction. In this paper, we are interested in the following optimization problem 



where A £ R mxn , b £ M. m and J(x) is a continuous convex function. An important instance of (jl.lj) is the 
so-called basis pursuit problem when J{x) :— \\x\\\ = Y^j=i \ x j\ : 

m 

(1.2) min ||a;||i s.t. Ax 

in _ 

Since the development of the new paradigm of compressed sensing [IT] , the basis pursuit problem fj 1 . 2 j) 
has become a topic of great interest. In compressed sensing, A is usually the product of a sensing matrix <1> 
and a transform basis matrix "A 7 and b is a vector of the measurements of the signal s = $x. The theory of 
compressed sensing guarantees that the sparsest solution (i.e., representation of the signal s = fyx in terms 
of the basis *f?) of Ax — b can be obtained by solving (| 1 .2(1 under certain conditions on the matrix <!> and the 
sparsity of x. This means that (|1.2p gives the optimal solution of the following NP-hard problem [21] : 



(1.3) min llxllo s.t. Ax = b. 

where ||x||o counts the number of nonzero elements of x. 

Matrix generalizations of (II. 3|) and ()1.2|) . respectively, are the so-called matrix rank minimization prob- 
lem 

(1.4) min rank(X) s.t. A(X) = d, 
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and its convex relaxation, the nuclear norm minimization problem: 



(1.5) min \\X\L s.t. A(X) = d, 

where A : R mx ™ — j. M. p is a linear operator, d £ R p , and (the nuclear norm of X) is defined as the sum 

of singular values of matrix X . A special case of (|1.4[) is the matrix completion problem: 

(1.6) ^min rank(X) s.t. X tj = My, V(i, j) £ CI, 
whose convex relaxation is given by: 

(1.7) min s.t. X iS = My , V(i, j) £ O. 

The matrix completion problem has a lot of interesting applications in online recommendation systems, 
collaborative filtering [35, 36 , etc., including the famous Netflix problem [33]. It has been proved that under 
certain conditions, the solutions of the NP-hard problems (|1.4[) and (|1.6[) are given respectively by solving 
their convex relaxations (|1.5p and (|1.7p . with high probability (see, e.g., [3"T1 151 [TU1 1501 ITS] ). 

The linearized Bregman (LB) method was proposed in [42] to solve the basis pursuit problem (jl.2[) . The 
method was derived by linearizing the quadratic penalty term in the augmented Lagrangian function that is 
minimized on each iteration of the so-called Bregman method introduced in [27j while adding a prox term 
to it. The linearized Bregman method was further analyzed in [SJ [71 HT] and applied to solve the matrix 
completion problem (|1.7p in [5]. 

Throughout of this paper, we will sometimes focus our analysis on the basis pursuit problem (jl.2p . 
However, all of the analysis and results can be easily extended to (|1.5p and (|1.7p . The linearized Bregman 
method depends on a single parameter /i > and, as the analysis in 0(7] shows, actually solves the problem 

(1.8) min (x) := + — 1| ar||f , s.t. Ax = b, 

iEK™ Z/J, 

rather than the problem (|1.2p . Recently it was shown in [U] that the solution to (|1.8p is also a solution 
to problem (| 1 . 2|) as long as n is chosen large enough. Furthermore, it was shown in [41 that the linearized 
Bregman method can be viewed as a gradient descent method applied to the Lagrangian dual of problem 
(|1.8p . This dual problem is an unconstrained optimization problem of the form 



(1.9) nun G M (y), 

where the objective function G fl (y) is differentiable since g^(x) is strictly convex (see, e.g., [32]). Motivated 
by this result, some techniques for speeding up the classical gradient descent method applied to this dual 
problem such as taking Barzilai-Borwein (BB) steps p], and incorporating it into a limited memory BFGS 
(L-BFGS) method [15], were proposed in [41] , Numerical results on the basis pursuit problem (ll.2p reported 
in [41) show that the performance of the linearized Bregman method can be greatly improved by using these 
techniques. 

Our starting point is also motivated by the equivalence between applying the linearized Bregman method 
to (ll.2p and solving the Lagrangian dual problem (|1.9p by the gradient descent method. Since the gradient of 
G>(y) can be shown to be Lipschitz continuous, it is well-known that the classical gradient descent method 
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with a properly chosen step size will obtain an e-optimal solution to (|1.9[) (i.e., an approximate solution y k 
such that Gfj,(y k ) — G M (y*) < e) in 0(l/e) iterations. In [23], Nesterov proposed a technique for accelerating 
the gradient descent method for solving problem of the form (|1.9[) (see, also, [24-), and proved that using this 
accelerated method, the number of iterations needed to obtain an e-optimal solution is reduced to 0(1/ ^fe) 
with a negligible change in the work required at each iteration. Nesterov also proved that the 0(1/ ^/e) 
complexity bound is the best bound that one can get if one uses only the first-order information. Based on 
the above discussion, we propose an accelerated linearized Bregman (ALB) method for solving f|l .8|) which 
is equivalent to an accelerated gradient descent method for solving the Lagrangian dual (|1.9p of (11.81) . As 
a by-product, we show that the basic and the accelerated linearized Bregman methods require 0(1/ e) and 
0(1/ ^Je) iterations, respectively, to obtain an e-optimal solution with respect to the Lagrangian for (|1.8[) . 

The rest of this paper is organized as follows. In Section [5] we describe the original Bregman iterative 
method, as well as the linearized Bregman method. We motivate the methods and state some previously 
obtained theoretical results that establish the equivalence between the LB method and a gradient descent 
method for the dual of problem (|1.8[) . We present our accelerated linearized Bregman method in Section [3] 
We also provide a theoretical foundation for the accelerated algorithm and prove complexity results for it 
and the unaccelerated method. In Section @J we describe how the LB and ALB methods can be extended 
to basis pursuit problems that include additional convex constraints. In Section [SJ we report preliminary 
numerical results, on several compressed sensing basis pursuit and matrix completion problems. These 
numerical results show that our accelerated linearized Bregman method significantly outperforms the basic 
linearized Bregman method. We make some conclusions in Section [51 

2. Bregman and Linearized Bregman Methods. The Bregman method was introduced to the 
image processing community by Osher et al. in [27] for solving the total-variation (TV) based image 
restoration problems. The Bregman distance [4] with respect to convex function J(-) between points u and 
v is defined as 

(2.1) D p j(u,v) :=J(u)-J(v)-(p,u-v), 

where p £ dJ(v), the subdiffcrential of J at v. The Bregman method for solving (|l.lj) is given below as 
Algorithm [T] Note that the updating formula for p k (Step 4 in Algorithm [1} is based on the optimality 
conditions of Step 3 in Algorithm [T] 

£ dJ(x k+l ) -p k + A T (Ax k+1 - b). 

This leads to 

p k+1 =p k - A T (Ax k+1 -b). 

It was shown in (27] 02] that the Bregman method (Algorithm [TJ converges to a solution of (|1.1| in a finite 
number of steps. 

It is worth noting that for solving , the Bregman method is equivalent to the augmented Lagrangian 
method [T71 [29l [33] in the following sense. 

Theorem 2.1. The sequences {x k } generated by Algorithm^ and by the augmented Lagrangian method, 
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Algorithm 1 Original Bregman Iterative Method 
1: Input: x° = p° = 0. 
2: for k = 0, 1,- • ■ do 

3: x k+1 — argmin-r D P j (x, x k ) + \\\Ax - b\\ 2 ; 
4: p k+1 = p k - A T (Ax k+1 - b); 
5: end for 



k+1 := &rgmm x J(x)-(\ k ,Ax-b) + ±\\Ax- 
k+i ._ X k _ ( Ax k+i _ fo) 



which computes for k — 0, 1, 

(2.2) { ; 

starting from A = are exactly the same. 

Proof. From Step 4 of Algorithm ED and the fact that p° = 0, it follows that p fc = - £)* =1 A 1 " (Ac? - 6). 
From the second equation in (|2.2p and using A = 0, we get A fc = — (-Aa^ ~ Thus, p k = A T X k for 
all k. Hence it is easy to see that Step 3 of Algorithm [T] is exactly the same as the first equation in (|2.2[> 
and that the x k+1 computed in Algorithm 1 and (I2.2p are exactly the same. Therefore, the sequences {x k } 
generated by both algorithms are exactly the same. □ 

Note that for J(x) :— a||a;||i, Step 3 of Algorithm Q] reduces to an ^i-regularized problem: 

(2.3) min Gjlldli - (p k , x) + -\\Ax - bf. 

x 2 

Although there are many algorithms for solving the subproblem (I2.3[) such as FPC [TB], SPGL1 [33], FISTA 
[2] etc., it often takes them many iterations to do so. The linearized Bregman method was proposed in [42], 
and used in [28j [3 |6] to overcome this difficulty. The linearized Bregman method replaces the quadratic 
term — b\\ 2 in the objective function that is minimized in Step 3 of Algorithm [T] by its linearization 

(A T (Ax k — b),x) plus a proximal term j^\\x ~ x k \\ 2 . Consequently the updating formula for p k is changed 
since the optimality conditions for this minimization step become: 

e 3J{x k+1 ) - p k + A T (Ax k - 6) + -{x k+1 -x k ). 

M 

In Algorithm [5] below we present a slightly generalized version of the original linearized Bregman method 
that includes an additional parameter r that corresponds to the length of a gradient step in a dual problem. 

Algorithm 2 Linearized Bregman Method 
1 

2 
3 
1 
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Input: x° = p° = 0, fi > and t > 0. 
for k = 0, 1, • • • do 

x k+l = arg minx D P k (j. ]3 ,*) + T ^T ( Ax k _ b ) 5X ) + ^_|| a ._ a; fc||2. 



p k+i =p k _ T A T (Ax k - b) - i(x fc+1 - x k ); 
end for 



In [41], it is shown that when /x||A|| 2 < 2, where ||A|| denotes the largest singular value of A, the 
iterates of the linearized Bregman method (Algorithm 2 with r = 1) converge to the solution of the following 
regularized version of problem (jl.ljl : 

(2.4) min J(x) + -^-\\x\\ 2 s.t. Ax = b. 

x 2p 



We prove in Theorem 12.31 below an analogous result for Algorithm 2 for a range of values of r. However, we 
first prove, as in |41) . that the linearized Bregman method (Algorithm 2) is equivalent to a gradient descent 
method 



(2-5) y k+1 :=y k -rS7G,(y k ) 
applied to the Lagrangian dual 

maxmin{ J(w) H ||"H| 2 — {Vi Aw — b)} 

V w 2fJL 

of (|2.4p . which we express as the following equivalent minimization problem: 
(2.6) min G^y) := ~{J(w*) + ^\\w*\\ 2 - (y, Aw* - b)}, 

where 

w* := argmin{ J(w) + — \\w\\ 2 — (y, Aw — b)}. 
To show that G M (y) is continuously differentiable, we rewrite G l _ l (y) as 

G,(y) = -^>^A T y) + | \\A T y\\ 2 b T y, 

where 

^(u) ee min{J(u;) + ^-\\w - v\\ 2 } 

is strictly convex and continuously differentiable with gradient V$ M (w) = iL ^ L , and w — aigmm w {J(w) + 
j^\\w — v\\ 2 } (e.g., see Proposition 4.1 in 3 ). From this it follows that VG M (y) = Aw* — b. Hence the 
gradient method (|2.5[) corresponds to Algorithm [3] below. 

Algorithm 3 Linearized Bregman Method (Equivalent Form) 



Input: /i > 0, t > and y° = rb. 
for k = 0, 1, • • ■ do 



jk+l 

y k+i ._ y k _ T (A w k+i _ b y 



<r" ' := argrnin«,{J(u/) + ^j||w|| 2 - (y k , Aw - b)}; 



end for 



Lemma [2.21 and Theorem 12.31 below generalize Theorem 2.1 in [41] by allowing a step length choice in 
the gradient step (|2.5|) and show that Algorithms [2] and [3] are equivalent. Our proof closely follows the proof 
of Theorem 2.1 in [4Tj . 

Lemma 2.2. x k+1 computed by Algorithm^ equals w k+1 computed by Algorithm^ if and only if 

(2.7) A T y k =p k -rA T (Ax k -b) + -x k . 

A* 

Proof. By comparing Step 3 in Algorithms [2] and El it is obvious that w k+1 is equal to x k+1 if and only 
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if (J221) holds. □ 

Theorem 2.3. The sequences {x k } and {w k } generated by Algorithms^ and{3\ are the same. 

Proof. We prove by induction that equation (12.71) holds for all k > 0. Note that (j2.7l) holds for k = 
since p° = x° = and y° — rb. Now let us assume that (12 . T[) holds for all < k < n — 1; thus by Lemma [2~2 
w k+1 — x k+1 for all < k < n — 1. By iterating Step 4 in Algorithm [3] we get 

n 

(2.8) y n = y 11 - 1 - r{Aw n - b) = - £ t(Ab» - 6). 

j=0 



By iterating Step 4 in Algorithm [5] we get 



n-l 

■^2rA T (Ax j -b) 

3=0 



1 r 



which implies that 

1 fc 

p" - rA T (yla;" - b) + -x 11 = - tA t {Ax j -b)=A T y r \ 

M 3=0 

where the last equality follows from (I2.8I) ; thus by induction (|2.7[) holds for all k > 0, which implies by 
Lemma I2T21 that x k — w k for all k > Q. □ 

Before analyzing Algorithms[5]and[31 we note that by defining v k = A T y k and algebraically manipulating 
the last two terms in the objective function in Step 3 in Algorithm [3J Steps 3 and 4 in that algorithm can 
be replaced by 



(2.9) 



w k+1 := argmim u J(w) + j^\\w — /j,v k \\ 2 



v 



k+1 — ,,k 



rA T (Aw k+1 -b) 



if we set v° = rA T b. Because Algorithms [2] and [3] are equivalent, convergence results for the gradient descent 
method can be applied to both of them. Thus we have the following convergence result. 

Theorem 2.4. Let J(w) = ||to||i. Then G M (y) in the dual problem (|2.6[) is continuously differ •entiable 
and its gradient is Lipschitz continuous with the Lipschitz constant L < {j,\\A\\ 2 . Consequently, if the step 
length t < ^jojp > the sequences {x k } and {w k } generated by Algorithms^ and[B converge to the optimal 
solution of (|2.4p . 

Proof. When J(x) = \\x\\\, w k+1 in ()2 .9|) reduces to 

w k+1 = [i ■ shrink(w fc , 1), 
where the t\ shrinkage operator is defined as 

(2.10) shrinked) := sgn(z) o max{|z| - a, 0}, Vz E K", a > 0. 

G^(y) is continuously differentiable since <7/i(x) is strictly convex. Since for any point y, VG A1 (y) = Aw — b, 
where w = [i ■ shrink(A T y, 1), it follows from the fact that the shrinkage operator is non-expansive, i.e., 



||shrink(s, a) — shrink(i, a)\\ < \\s — t\\, \fs,t,a 
6 



that 



HVG^y 1 ) - VG M (j/ 2 )|| = • A ■ shrink^V, 1) - /x • A ■ shrink^V, 1)|| 

< /xii^n 2 !^ 1 - 2/ 2 ||, 

for any two points y 1 and y 2 . Thus the Lipschitz constant L of VG M (-) is bounded above by ^||A|| 2 . 

When r < ^jpjp, we have tL < 2 and thus |1 — tL\ < 1. It then follows that the gradient descent 
method y k+1 = y — T\7G fJl (y k ) converges and therefore Algorithms [2] and [3] converge to a;*, the optimal 
solution of (|2l| . □ 

Before developing an accelerated version of the LB algorithm in the next section. We would like to 
comment on the similarities and differences between the LB method and Nesterov's composite gradient 
method [26] and the ISTA method [2] applied to problem Ijl.ljl and related problems. The latter algorithms 
iterate Step 3 in the LB method (Algorithm [2]) with p k = 0, and never compute or update the subgradient 
vector p k . More importantly, their methods solve the unconstrained problem 

min ||x|| 1 + ^||Ax-6|| 2 . 

igl" 2/x 

Hence, while these methods and the LB method both linearize the quadratic term ||Ar — b\\ 2 while handling 
the nonsmooth term directly, they are very different. 

Similar remarks apply to the accelerated LB method presented in the next section and fast versions of 
ISTA and Nesterov's composite gradient method. 

3. The Accelerated Linearized Bregman Algorithm. Based on Theorem l2.3[ i.e., the equivalence 
between the linearized Bregman method and the gradient descent method, we can accelerate the linearized 
Bregman method by techniques used to accelerate the classical gradient descent method. In [JT], Yin con- 
sidered several techniques such as line search, BB step and L-BFGS, to accelerate the linearized Bregman 
method. Here we consider the acceleration technique proposed by Nesterov in [53] . This technique accel- 
erates the classical gradient descent method in the sense that it reduces the iteration complexity significantly 
without increasing the per-iteration computational effort. For the unconstrained minimization problem (jl.9p . 
Nesterov's accelerated gradient method replaces the gradient descent method (|2.5[) by the following iterative 
scheme: 

x k+1 := y k ~T\7G^{y k ) 
y k+1 := a k x k+1 + (1 - a k )x k , 

where the scalars a k are specially chosen weighting parameters. A typical choice for a k is a k = jfrg. If t 
is chosen so that r < 1/L, where L is the Lipschitz constant for VG M (-), Nesterov's accelerated gradient 
method (|3 . 1 [) obtains an e-optimal solution of (|1.9[) in 0(1/ ^fe) iterations, while the classical gradient method 
(|2.5|) takes 0(1/ e) iterations. Moreover, the per-iteration complexities of (|2.5I) and (I3.1[) are almost the same 
since computing the gradient VG M (-) usually dominates the computational cost in each iteration. Nesterov's 
acceleration technique has been studied and extended by many others for nonsmooth minimization problems 
and variational inequalities, e.g., see [25l [26j [2j [38l [22j [P2| [T3 | fT4 ] . 

Our accelerated linearized Bregman method is given below as Algorithm^ The main difference between 
it and the basic linearized Bregman method (Algorithm [2]) is that the latter uses the previous iterate x k 
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and subgradient p k to compute the new iterate x k+1 , while Algorithm 2] uses extrapolations x k and p k that 
are computed as linear combinations of the two previous iterates and subgradients, respectively. Carefully 
choosing the sequence of weighting parameters {a k } guarantees an improved rate of convergence. 

Algorithm 4 Accelerated Linearized Bregman Method 
1: Input: x° = x° = p° = p° = 0, \i > 0, r > 0. 
2: for k = 0, 1, • • • do 

3: x k+1 = &rgmm x Df (x,x k ) + r(A T (Ax k - b),x) + ±\\x - x k \\ 2 ; 
4: p k+l = p k - rA T (Ax k - b) - j I (x k+1 - i k ); 
5: x k+1 = a k x k+1 + (1 - a k )x k ; 
6: p k+1 = a k p k+1 + (1 - a k )p k - 
7: end for 



In the following, we first establish the equivalence between the accelerated linearized Bregman method 
and the corresponding accelerated gradient descent method (|3.1|) . which we give explicitly as (|3.2j) below 
applied to the dual problem (|2.6[) . Based on this, we then present complexity results for both basic and 
accelerated linearized Bregman methods. Not surprisingly, the accelerated linearized Bregman method 
improves the iteration complexity from 0(l/e) to 0(l/y/e). 

Theorem 3.1. The accelerated linearized Bregman method (Algorithm^) is equivalent to the accelerated 
dual gradient descent method (13. 2j) starting from y° = y° = rb: 

= &r g mm J (w) + jj2\\w\\ 2 — (y k , Aw — b) 
= y k - T(Aw k+1 - b) 
= a k y k+1 + (1 - a k )y k . 

More specifically, the sequence {x k } generated by Algorithm [J] is exactly the same as the sequence {w k } 
generated by (|3.2j) . 

Proof. Note that the Step 3 of Algorithm [4] is equivalent to 

(3.3) x k+1 := argminJ(x) - (p k ,x) + t(A t (Ai k - b).x) + —\\ x - x k \\ 2 . 
Comparing Q3.3P with the first equation in (I3.2j) . it is easy to see that x k+1 = w k+1 if and only if 

(3.4) A T y k =p k + TA T (b- Ax k ) + -x k . 

A* 

We will prove (|3.4|) in the following by induction. Note that (|3.4j) holds for k = since y° = rb and 
x° = p° = 0. As a result, we have x 1 = to 1 . By defining w° = 0, we also have x° — to , 

(3.5) A T y 1 = A T (a y 1 + (1 - a )A T y°) = a„A T y° + a rA T (b - Aw 1 ) + (1 - a )A T y°. 
On the other hand, 

(3.6) p 1 =p° + rA T (b - Ax°) --{x 1 - x°) = A T y° - -x\ 

where for the second equality we used (|3.4p for k = 0. Expressing p 1 and i 1 in terms of their affine 
combinations of p 1 , p°, x 1 and x°, then substituting for p 1 using (|3.6[> and using the fact that x° = p° = 0, 
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(3.2) 



r.k+1 



and finally using y° — rb and (|3.5[) . we obtain, 

p 1 + rA T (b - Ax 1 ) + -x 1 = aop 1 + (1 - a )p° + a TA T (b - Ax 1 ) + (1 - a )rA T (b - Ax°) + -{a^x 1 + (1 - a )x°) 

[A [L 

= a (A T y° x 1 ) + a TA T (b - Ax 1 ) + (1 - a Q )TA T b + -uqx 1 

= a Q A T y° + a TA T {b - Ax 1 ) + (1 - a )A T y° 
= a Q A T y° + a TA T (b - Aw 1 ) + (1 - a )A T y° 
= A^y 1 . 

Thus we proved that (|3.4[) holds for k = 1. Now let us assume that (|3.4j) holds for < k < n — 1, which 
implies x k = w k , VO < k < n since a; — w°. We will prove that Q3.4p holds for k = n. 

First, note that 

(3.7) p n = p' 1 - 1 + rA T (b - Ax"- 1 ) - -(x n - x^ 1 ) = A T y n - 1 - -x n , 

where the first equality is from Step 4 of Algorithm 0] and the second equality is from (|3.4p for k = n — 1. 
From Step 6 of Algorithm |4] and (|3.7|) , we have 

p n = a n -ip n + (1 - an-Op"- 1 

(3.8) = a„_ 1 (A T y"- 1 -i a; ") + (l-a„_ 1 )(A T y^ 2 -^"- 1 ) 

= ^-i^y"- 1 + (1 - a n ^)A T y n ~ 2 - ±x n , 

where the last equality uses Step 5 of Algorithm On the other hand, from (|3.2[) we have 

A T y n = A T (a„_ l2 /" + (l-a„_ 1 )y™- 1 ) 

= a n ^ 1 A T {y n - 1 +r(b~Aw n )) + (l-a n ^ 1 )A T {y n - 2 +T{b-Aw n - 1 )) 

= a^A 1 ^- 1 + (1 - a„_i)A T y™~ 2 + tA t [6 - A(a n ^ 1 x n + (1 - a^x"" 1 )] 

= a^i^y"- 1 + (1 - a„_i)A T y"- 2 + tA t {b - Ai n ), 



where the third equality is from w n = x n and w n 1 = x n 1 , the last equality is from Step 5 of Algorithm [4] 
Combining (j3~g|) and (pH)f we get that (pH)) holds for fc = n. □ 

Like the linearized Bregman, we can also use a simpler implementation for accelerated linearized Bregman 
method in which the main computation at each step is a proximal minimization. Specifically, (|3.2p is 
equivalent to the following three steps. 



„fc+i 



(3.10) 



ir"'' := argmin J(w) + j^\\w — ^j,v k \ 2 
ti k - tA t (Aw k+1 -b) 
a k v k+1 + (1 - a k )v k 



As before this follows from letting v k — A T y k and v k = A T y k and completing the square in the objective 
function in the first equation of p. 21) . 

Next we prove iteration complexity bounds for both basic and accelerated linearized Bregman algorithms. 
Since these algorithms are standard gradient descent methods applied to the Lagrangian dual function and 
these results have been well established, our proofs will be quite brief. 

9 



Theorem 3.2. Let the sequence {x k } be generated by the linearized Bregman method (Algorithm^) 
and (x*,y*) be the pair of optimal primal and dual solutions for Problem (|2.4p . Let {y k } be the sequence 
generated by Algorithm^ and suppose the step length t < ^, where L is the Lipschitz constant for VG M (y). 
Then for the Lagrangian function 

(3.H) Cpfav) = J(x) + ^\\x\\ 2 -(y,Ax-b), 

we have 

(3.12) C^x*,y*)-C^x k +\y k )< h * ^ . 

Thus, if we further have r > (3/L, where < /3 < 1, then (x k+1 ,y k ) is an e-optimal solution to Problem 
(|2.4|) with respect to the Lagrangian function if k > \C/e], where C :— L ^ y ' 2 ^ v - . 
Proof. From (|2.6j) we get 

(3.13) G,(y k ) = -C,(x k +\y k ). 

By using the convexity of function G^(-) and the Lipschitz continuity of the gradient VG^(-), we get for 
any y, 



G^y k ) - G M (y) < G^/" 1 ) + (VG,^ 1 ),^ - y k ~ x ) + § \\y k - y^f - G^y) 

(3.14) 



< G^y*- 1 ) + (VG^- 1 ),^ - y^ 1 ) + ±\\y k - y^f - G^y) 

< (VG^^- 1 ),^" 1 - y) + (VG^" 1 ), y k - y^ 1 ) + ±\\y k - y^f 



= (VG^y k - l ),y k -y) + ±\\y k - y k ~Y 

= ny k - 1 -y\y k -y) + Uy k -y k ~ 1 W 2 
< M\\y~y k ^W 2 -\\y-y k \\ 2 )- 

Setting y = y k ~ l in (|3.14[) . we obtain G^{y k ) < G A1 (y fc_1 ) and thus the sequence {G ll {y k )} is non-increasing. 
Moreover, summing (|3.14l) over k — 1, 2, . . . , n with y = y* yields 

n(G,(y n ) - G,(y*)) < £(G M (y fc ) - G,(y*)) < f (\\y* y°\\ 2 \\y* y n \\ 2 ) < ±\\y* y°\\ 2 , 

k=l T T 

and this implies (|3.12[) . □ 

Before we analyze the iteration complexity of the accelerated linearized Bregman method, we introduce 
a lemma from [38 that we will use in our analysis. 

Lemma 3.3 (Property 1 in [38 ). For any proper lower semicontinuous function ip : M™ —¥ (— oo,+oo] 
and any z € R" , if 

z+ = argmin{-0(x) + ~\\x - z|| 2 }, 

X Z 



then 



+ \\\ x ~ z \\ 2 > */>(*+) + - 4 2 + - z+f, Vx e 
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The following theorem gives an iteration-complexity result for the accelerated linearized Bregman method. 
Our proof of this theorem closely follows the proof of Proposition 2 in [38] . 



Theorem 3.4. Let the sequence {x k } be generated by accelerated linearized Bregman method (Algorithm 
[Jp and (x*,y*) be the optimal primal and dual variable for Problem (|2.4[) . Let {a k } be chosen as 

(3.15) a k ^ = l + 8 k (0-\-l), 
where 

(3.16) 0_i := 1, and 9 k = -r^-r,Vfc > 0. 

k + 2 

Let the sequence {y k } be defined as in (|3.2j) and the step length t < r, where L is the Lipschitz constant of 
VG M (y) and G M (-) is defined by (|3.13p . We have 

Oil,,* _ ,,0 II 2 

(3.17) G„{y k ) - G„{y*) < W V 11 



rfc 2 

Thus, if we further have r > /3/£, where < /3 < 1, then (x k+1 ,y k ) is an e-optimal solution to Problem 

i — 
T 



(|2.4p with respect to the Lagrangian function (|3.11|) if k > C / e] , where C := 2L ^ V y U , 



Proof. Let 

(3.i8) «* = »*- 1 + C l (w*-y fc " 1 ) 

and denote the linearization of G^ (y) as 

(3-19) l G .(x;y):=G fl (y) + (VG l ,(y),x-y)<G fl (x). 

Therefore the second equality in (|3.2p is equivalent to 



y k+1 :=argniin G M (y fe ) + (VG M (y fc ), y - + - y k \\ 2 
y At 

r.k\ i 1 n„, -fe||2 



argmin l G (y; y ') + —\\y - y 
v It 
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Define y k := (1 - 6 k )y k + 8 k y*, we have 



(3.20) 

G^(y k+1 ) < G,(y k ) + (WG^y k ),y k+1 - y k ) + ^\\y k+1 - y k \\ 2 

<iGAy k+1 ;y k ) + ^h k+1 ~y k \\ 2 

< i G M;v k ) + ^\\y k - v k W 2 ^\\y k - y k+1 W 2 

= IgM - k )y k + 9 k y*;y k ) + i-||(l - 6 k )y k + 6 k y* - y k \\ 2 - i||(l - 6 k )y k + 6 k y* - y k + l f 
= l G J(l - 9 k )y k + 6 k y*;y k ) + %-\\y* + e~ k \y k - y k ) - y k f - ^\\y* + 6^(y k - y k+1 ) - y k f 
= i G(i ((l - 9 k )y k + 9 k y*-y k ) + ^\\y* - z k \\ 2 - ^\\y* - z k+1 \\ 2 
= (1 - e k )l G ^(y k ;y k ) + 9 k l G ^y*;y k ) + &\\y* - z k \\ 2 - ^\\y* - z k+1 \\ 2 

< (1 - k )G,(y k ) + 9 k G,(y*) + &\\y*- z k \\ 2 - ^\\y* - z k+1 \\ 2 , 

where the second inequality is from p. 191) and r < 1/L, the third inequality uses Lemma |3~B1 with ip(x) := 
rl Gfi (x;y k ), the third equality uses (|3.18p . (|3.2[) and p. 151) and the last inequality uses p,19[) . 

Therefore we get 

±(G,(y k+1 ) - G,(y*)) < ^^(G,(y k ) - G^y*)) + ^\\v~ ~ ^\\v ~ ^ +1 || 2 - 
From p~To) . it is easy to show that < -J— for all k > 0. Thus (|3T20|) implies that 

k k — 1 

(3.21) Iz^ti (G^+i) _ G^(y*)) < ±^(G»(y k ) - + i-|| y - z fc || 2 - z* +1 || 2 . 

Summing p.21[) over fc = 0, 1, . . . , n — 1, we get 

^(<W) - < - z°|| 2 = i-Hy- - y °|| 2 , 

which immediately implies p.!7[) . □ 

Remark 3.5. The proof technique and the choice of 8 k used here are suggested in '381 for accelerating 
the basic algorithm. Other choices of 8 k can be found in \23i \24\ [H \38jj . They all work here and give the 
same order of iteration complexity. 



4. Extension to Problems with Additional Convex Constraints. We now consider extensions 
of both the LB and ALB methods to problems of the form 

(4.1) min J (a;) s.t Ax = 6, 

12 



where X is a nonempty closed convex set in R™. It is not clear how to extend the LB and ALB methods 
(Algorithms [2] and QJ to problem (|4.1|) since we can no longer rely on the relationship 

e d.J(x k+1 ) -p k + A T (Ax k ~b) + -{x k+1 - x k ) 

to compute a subgradient p k+1 £ dJ(x k+1 ). Fortunately, the Lagrangian dual gradient versions of these 
algorithms do not suffer from this difficulty. All that is required to extend them to problem (|4.1|) is to 
include the constraint w £ X in the minimization step in these algorithms. Note that the gradient of 

&ii( v ) = min{J(ii;) + — \\w — v\\ 2 } 

remains the same. Also it is clear that the iteration complexity results given in Theorems 13.21 and 13.41 apply 
to these algorithms as well. 

Being able to apply the LB and ALB methods to problems of the form of (|4.1I) greatly expands their 
usefulness. One immediate extension is to compressed sensing problems in which the signal is required to 
have nonnegative components. Also (|4.ip directly includes all linear programs. Applying the LB and ALB 
to such problems, with the goal of only obtaining approximated optimal solutions, will be the subject of a 
future paper. 

5. Numerical Experiments. In this section, we report some numerical results that demonstrate 
the effectiveness of the accelerated linearized Bregman algorithm. All numerical experiments were run in 
MATLAB 7.3.0 on a Dell Precision 670 workstation with an Intel Xeon(TM) 3.4GHZ CPU and 6GB of 
RAM. 

5.1. Numerical Results on Compressed Sensing Problems. In this subsection, we compare the 
performance of the accelerated linearized Bregman method against the performance of the basic linearized 
Bregman method on a variety of compressed sensing problems of the form (|1.2[) . 

We use three types of sensing matrices A £ M. mxn . Type (i): A is a standard Gaussian matrix generated 
by the randn(m, n) function in MATLAB. Type (ii): A is first generated as a standard Gaussian matrix and 
then normalized to have unit-norm columns. Type (iii): The elements of A are sampled from a Bernoulli 
distribution as either +1 or —1. We use two types of sparse solutions x* £ R n with sparsity s (i.e., the 
number of nonzeros in x*). The positions of the nonzero entries of x* are selected uniformly at random, and 
each nonzero value is sampled either from (i) standard Gaussian (the randn function in MATLAB) or from 
(ii) [—1,1] uniformly at random (2 * rand — 1 in MATLAB). 

For compressed sensing problems, where J(x) — \\x\\i, the linearized Bregman method reduces to the 
two-line algorithm: 

x k+1 := u ■ shrink(w fc , 1) 
v k+1 := v k +rA T {b- Ax k+1 ), 

where the i\ shrinkage operator is defined in (|2.10p . Similarly, the accelerated linearized Bregman can be 
written as: 

a ■ shrink(-D fe , 1) 
i k + rA T (b- Ax k+1 ) 
a k v k+1 + (1 - a k )v k . 
13 



c fc+l 
,fc+l 



Both algorithms are very simple to program and involve only one Ax and one A T y matrix- vector multipli- 
cation in each iteration. 

We ran both LB and ALB with the seed used for generating random number in MATLAB setting as 0. 
Here we set n = 2000, m — 0.4 x n,s — 0.2 x m, [i = 5 for all data sets. We set r = npnri ■ We terminated 
the algorithms when the stopping criterion 

(5.1) ||AE fc -6||/||&||<10- 5 

was satisfied or the number of iterations exceeded 5000. Note that (|5.ip was also used in [31]. We report 
the results in Table [57X1 



Table 5.1 

Compare linearized Bregman (LB) with accelerated linearized Bregman (ALB) 



Standard Gaussian matrix A 


Number of Iterations 


Relative error \\x — x*\\/\\x*\\ 


Type of x* 


n(m = 0.4n, s = 0.2m) 


LB 


ALB 


LB 


ALB 


Gaussian 


2000 


5000+ 


330 


5.1715e-3 


1.4646e-5 


Uniform 


2000 


1681 


214 


2.2042e-5 


1.5241e-5 


Normalized Gaussian matrix A 


Number of Iterations 


Relative error \\x — x* / \\x*\\ 


Type of x* 


n(m = 0.4n, s = 0.2m) 


LB 


ALB 


LB 


ALB 


Gaussian 


2000 


2625 


234 


3.2366e-5 


1.2664e-5 


Uniform 


2000 


5000+ 


292 


1.2621e-2 


1.5629e-5 


Bernoulli +1/-1 matrix A 


Number of Iterations 


Relative error \\x — x*\\/\\x*\\ 


Type of x* 


n(m = 0.4n, s = 0.2m) 


LB 


ALB 


LB 


ALB 


Gaussian 


2000 


2314 


222 


4.2057e-5 


1.0812e-5 


Uniform 


2000 


5000+ 


304 


1.6141e-2 


1.5732e-5 



In Table 15.11 we see that for three out of six problems, LB did not achieve the desired convergence 
criterion within 5000 iterations, while ALB satisfied this stopping criterion in less than 330 iterations on 
all six problems. To further demonstrate the significant improvement the ALB achieved over LB, we plot 
in Figures 15.11 15.21 and 15.31 the Euclidean norms of the residuals and the relative errors as a function of the 
iteration number that were obtained by LB and ALB applied to the same data sets. These figures also depict 
the non-monotonic behavior of the ALB method. 



5.2. Numerical Results on Matrix Completion Problems. There are fast implementations of 
linearized Bregman [5] and other solvers [201 133 EH EU] for solving matrix completion problems. We do not 
compare the linearized Bregman and our accelerated linearized Bregman algorithms with these fast solvers 
here. Rather our tests are focused only on comparing ALB with LB and verifying that the acceleration 
actually occurs in practice for matrix completion problems. 

The nuclear norm matrix completion problem (ll.7|) can be rewritten as 

(5.2) min ||X||* s.t. Vq(X)=Vq{M), 

where [P^X)]^ = X, t j if e f2 and [Pn(X)]ij = otherwise. When the convex function J(-) is the 
nuclear norm of matrix X, the Step 3 of Algorithm [2] with inputs X k , P k can be reduced to 

(5.3) X k+1 := arg min fi\\X\\* + - (X k - ^TV n {V n X k - Vn(M)) - P k ))\\ 2 F - 

X£l rax " I 
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LB 
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200 400 600 800 1000 1200 1400 1600 1800 
Relative Errors 



LB 
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200 400 600 800 1000 1200 1400 1600 1800 



Fig. 5.1. Gaussian matrix A, Left: Gaussian x* , Right: Uniform x* 
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Fig. 5.2. Normalized Gaussian matrix A, Left: Gaussian x* , Right: Uniform x* 
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Fig. 5.3. Bernoulli matrix A, Left: Gaussian x* , Right: Uniform x* 
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It is known (see, e.g., [51 [20]) that (|5.3p has the closed-form solution, 



X k+1 = Shrink(X fe - n{rVn{VnX k - V n {M)) - P k ), /*), 

where the matrix shrinkage operator is defined as 

Shrink(y, 7) := C/Diag(max(cr - 7, 0))V T , 

and Y = UDia.g(a)V T is the singular value decomposition (SVD) of matrix Y. Thus, a typical iteration 
of the linearized Bregman method (Algorithm [2]) , with initial inputs X Q = P° = 0, for solving the matrix 
completion problem (|5.2[) can be summarized as 



(5.4) 



X k+1 := Shrmk{X k - ^{ T Vn(VnX k -Vn(M))- P k ),fi) 
P k+i ._ pk _ T (jD nX k - VnM) - (X k+1 - X k )/n. 



Similarly, a typical iteration of the accelerated linearized Bregman method (Algorithm 2]), with initial inputs 
X° — P° — X° = P° = 0, for solving the matrix completion problem (|5.2[) can be summarized as 



(5.5) 



X k+i 

pk+1 

X k+1 
pk+1 



= Shrink(Jf fc - fi(TV n (VnX k - Vn{M)) - P k ), 

= P k - r(VnX k - VnM) - (X k+1 - A fc )//i 

= a k X k+1 + (1 - a k )X k 

= a k P k+1 + {l-a k )P k , 



where the sequence a k is chosen according to Theorem 13.41 

We compare the performance of LB and ALB on a variety of matrix completion problems. We created 
matrices M £ M. nxn with rank r by the following procedure. We first created standard Gaussian matrices 
M L e R nxr and Mr, € W ixr and then we set M = M L M^. The locations of the p known entries in M were 
sampled uniformly, and the values of these p known entries were drawn from an iid Gaussian distribution. 
The ratio p/n 2 between the number of measurements and the number of entries in the matrix is denoted 
by "SR" (sampling ratio). The ratio between the dimension of the set of n x n rank r matrices, r(2n — r), 
and the number of samples p, is denoted by "FR". In our tests, we fixed FR to 0.2 and 0.3 and r to 10. 
We tested five matrices with dimension n = 100, 200, 300, 400, 500 and set the number p to r(2n — r)/FR. 
The random seed for generating random matrices in MATLAB was set to 0. /i was set to 5n (a heuristic 
argument for this choice can be found in [5]). We set the step length r to l//i since for matrix completion 
problems \\Vn\\ = 1. We terminated the code when the relative error between the residual and the true 

S i-e-, 



(5.6) \\Vn{X k ) - Vn{M)\\ F /\\Vn(M)\\ F < 10" 4 . 

Note that this stopping criterion was used in [5]. We also set the maximum number of iteration to 2000. 



We report the number of iterations needed by LB and ALB to reach (|5.6[) in Table 15.21 Note that 
performing the shrinkage operation, i.e., computing an SVD, dominates the computational cost in each 
iteration of LB and ALB. Thus, the per-iteration complexities of LB and ALB are almost the same and it 
is reasonable to compare the number of iterations needed to reach the stopping criterion. We report the 
relative error err := \\X k — M\\f/\\M\\f between the recovered matrix X k and the true matrix M in Table 
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Fig. 5.4. Comparison of LB and ALB on matrix completion problems with rank = 10, FR = 0.2 

15.21 We see from Table 15.21 that ALB needed significantly fewer iterations to meet the stopping criterion 

In Figures 15.41 and 15. 5[ we plot the Frobenius norms of the residuals and the relative errors obtained by 
LB and ALB for iteration 1-500 for the tests involving matrices with dimension n — 200,300,400 and 500. 
Note that the non-monotonicity of ALB is far less pronounced on these problems. 



Table 5.2 

Comparison between LB and ALB on Matrix Completion Problems 





FR = 0.2, rank = 10 


FR = 0.3, rank = 10 


11 


SR 


iter-LB 


err-LB 


iter-ALB 


crr-ALB 


SR 


itcr-LB 


err-LB 


iter-ALB 


err-ALB 


100 


0.95 


85 


1.07e-4 


63 


l.lle-4 


0.63 


294 


1.75e-4 


163 


1.65e-4 


200 


0.49 


283 


1.62e-4 


171 


1.58e-4 


0.33 


1224 


3.76e-4 


289 


1.83e-4 


300 


0.33 


466 


1.64e-4 


261 


1.60e-4 


0.22 


2000+ 


3.59e-3 


406 


1.93e-4 


400 


0.25 


667 


1.79e-4 


324 


1.65e-4 


0.17 


2000+ 


1.12c-2 


455 


1.80e-4 


500 


0.20 


831 


1.76e-4 


398 


1.65e-4 


0.13 


2000+ 


3.14c-2 


1016 


7.49e-3 
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Relative Error (n=200) 



Relative Error (n=300) 




100 200 300 400 




Relative Error (n=400) 



Relative Error (n=500) 
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Residual (n=400) 
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Residual (n=500) 



100 200 300 400 




Fig. 5.5. Comparison of LB and ALB on matrix completion problems with rank = 10, FR = 0.3 



6. Conclusions. In this paper, we analyzed for the first time the iteration complexity of the linearized 
Bregman method. Specifically, we show that for a suitably chosen step length, the method achieves a value 
of the Lagrangian of a quadratically regularized version of the basis pursuit problem that is within e of the 
optimal value in 0(1/ e) iterations. We also derive an accelerated version of the linearized Bregman method 
whose iteration complexity is reduced to 0(l/y / e) and present numerical results on basis pursuit and matrix 
completion problems that illustrate this speed-up. 
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